Style-Markers in Authorship Attribution: A Cross-Language Study of the Authorial Fingerprint

Author

  • Maciej Eder
Abstract

The present study addresses one of the theoretical problems of computer-assisted authorship attribution, namely the question of which traceable features of language can betray the authorial uniqueness (a stylistic fingerprint) of literary texts. A number of recent approaches show that, apart from lexical measures (especially those relying on the frequencies of the most frequent words), some other features of written language are also considerably effective as discriminators of authorial style. However, there have been no attempts to compare the attribution potential of these features. The aim of the present study, then, was to examine the effectiveness of several style-markers in authorship attribution. The style-markers chosen for the empirical investigation are those that can be retrieved from a non-lemmatized corpus of plain text files, such as the most frequent words, word bi-grams, different letter sequences, and markers of different nature combined in one sample. Equally important, however, was to compare the usefulness of the chosen style-markers across a few languages: English, Polish, German, and Latin. The results confirmed a high attribution effectiveness of word-based style-markers in the English corpus, but the alternative markers are shown to be usually more effective in the other languages.
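As a rough illustration of the kinds of style-markers named in the abstract, the sketch below extracts relative frequencies of the most frequent words, word bi-grams, and letter tri-grams from plain, non-lemmatized text and places them side by side in one profile. It is a minimal sketch under stated assumptions and not the study's actual procedure: the function names (`tokenize`, `word_ngrams`, `char_ngrams`, `relative_freqs`, `style_profile`), the tokenization rule, the choice of tri-grams, and the `top` cut-off are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a token is a lowercase run of letters (Unicode-aware).
    # No lemmatization is applied, matching the plain-text setting.
    return re.findall(r"[^\W\d_]+", text.lower())


def word_ngrams(tokens, n):
    # Consecutive word sequences of length n, joined with a space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    # Letter sequences of length n over the normalized token stream;
    # the single space between words is kept as part of the sequence.
    s = " ".join(tokenize(text))
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def relative_freqs(items, top):
    # Relative frequency of the `top` most frequent items.
    counts = Counter(items)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.most_common(top)}


def style_profile(text, top=100):
    # Hypothetical combined profile: three marker types in one sample.
    tokens = tokenize(text)
    return {
        "mfw": relative_freqs(tokens, top),
        "word_bigrams": relative_freqs(word_ngrams(tokens, 2), top),
        "char_trigrams": relative_freqs(char_ngrams(text, 3), top),
    }


if __name__ == "__main__":
    sample = "The cat and the dog and the bird."
    profile = style_profile(sample, top=3)
    print(profile["mfw"])
    print(profile["word_bigrams"])
```

Profiles built this way for texts of known and disputed authorship could then be compared with a distance measure of one's choosing; which measure and which marker type work best is exactly the kind of question the study examines across languages.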

Similar resources

Stripped of Authorship or Projected Identity? Iranian Scholars’ Presence in Research Articles

Research Article (RA) genre has been a significant area of research in academic writing over past decades. However, authors’ identity in RAs has not received much attention, especially in soft sciences like applied linguistics. This paper reports a corpus analysis of Iranian writers’ authorial presence markers in RAs in the field of applied linguistics. The corpus comprised 30 RAs (200,000 word...

Questioned Electronic Documents : Empirical Studies in Authorship Attribution

Forensic analysis of questioned electronic documents is very difficult, because the nature of the documents eliminates many kinds of informative differences. Recent work in authorship attribution demonstrates the practicality of analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and ...

Short samples in authorship attribution: a new approach

The question of minimal sample size is one of the most important issues in stylometry and nontraditional authorship attribution. In the last decade or so, a few studies concerning different aspects of scalability in stylometry have been published (Zhao and Zobel, 2005; Hirst and Feiguina, 2007; Stamatatos, 2008; Koppel et al., 2009; Mikros, 2009; Luyckx and Daelemans, 2011), but the question ha...

An experiment in authorship attribution

This paper reports an experiment in authorship attribution that reveals considerable authorial structure in texts written by authors with very similar background and training, with genre and topic being strictly controlled for. We interpret our results as supporting the hypothesis that authors have ’textual fingerprints’, at least for texts produced by authors who are not consciously changing t...

Social Media Writing Style Fingerprint

We present our approach for computer-aided social media text authorship attribution based on recent advances in short text authorship verification. We use various natural language techniques to create word-level and character-level models that act as hidden layers to simulate a simple neural network. The choice of word-level and character-level models in each layer was informed through validati...

Publication date: 2013